An English-Chinese Cross-lingual Word Semantic Similarity Measure Exploring Attributes and Relations
Authors
Abstract
Measuring the semantic similarity of words is fundamental to many NLP applications, and globalization has created an urgent need for cross-lingual word similarity measures. This paper proposes a word semantic similarity measure that works in cross-lingual scenarios. A concept can be defined by a set of attributes, so the basic idea of this work is to compute the similarity between words by exploring their attributes and relations. For a given word pair, we first compute similarities between their attributes by combining distance, depth and relation information. Word similarity is then computed through a combination scheme. The algorithm is implemented on HowNet, an English-Chinese bilingual ontology. Experiments show that the proposed algorithm achieves high correlation with human judgments, which encourages its broad use in cross-lingual applications.
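The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a tiny hand-made attribute taxonomy (not HowNet data), uses a Wu-Palmer style score as a stand-in for the combination of distance and depth, applies a hypothetical penalty when the relations attached to two attributes differ, and combines attribute scores by a symmetric best-match average. All names, weights and lexicon entries are illustrative assumptions.

```python
# Toy attribute taxonomy (child -> parent). Hypothetical data, not taken from HowNet.
PARENT = {
    "entity": None,
    "animate": "entity",
    "human": "animate",
    "animal": "animate",
    "artifact": "entity",
    "vehicle": "artifact",
}

# Assumed down-weighting when two attributes are attached by different relations.
RELATION_MISMATCH_PENALTY = 0.5


def path_to_root(node):
    """Return the list of nodes from `node` up to the taxonomy root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def attribute_similarity(a, b):
    """Wu-Palmer style score: deeper shared ancestors and shorter paths score higher."""
    ancestors_b = set(path_to_root(b))
    # The first ancestor of `a` that is also an ancestor of `b` is the lowest common one.
    lca = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2.0 * depth(lca) / (depth(a) + depth(b))


def pair_similarity(x, y):
    """Similarity of two (relation, attribute) pairs, penalising relation mismatch."""
    (rel_x, attr_x), (rel_y, attr_y) = x, y
    weight = 1.0 if rel_x == rel_y else RELATION_MISMATCH_PENALTY
    return weight * attribute_similarity(attr_x, attr_y)


def word_similarity(def_a, def_b):
    """Symmetric average of each attribute's best match in the other definition."""
    def directed(src, dst):
        return sum(max(pair_similarity(x, y) for y in dst) for x in src) / len(src)

    return 0.5 * (directed(def_a, def_b) + directed(def_b, def_a))


# Hypothetical bilingual lexicon: English and Chinese words share one attribute inventory,
# which is what lets the same function compare words across languages.
LEXICON = {
    "driver": [("is_a", "human"), ("patient", "vehicle")],
    "司机": [("is_a", "human"), ("patient", "vehicle")],
    "dog": [("is_a", "animal")],
    "狗": [("is_a", "animal")],
}

same = word_similarity(LEXICON["driver"], LEXICON["司机"])
different = word_similarity(LEXICON["driver"], LEXICON["狗"])
print(same, different)
```

Because both languages map into the same attribute inventory, no translation step is needed at comparison time; in this toy setup a translation pair such as "driver" and "司机" scores higher than an unrelated pair such as "driver" and "狗".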
Similar resources
Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks
This paper suggests a method for detecting cross-lingual semantic similarity using parallel PropBanks. We begin by improving word alignments for verb predicates generated by GIZA++ by using information available in parallel PropBanks. We applied the Kuhn-Munkres method to measure predicate-argument matching and improved verb predicate alignments by an F-score of 12.6%. Using the enhanced word al...
Chinese-English Bilingual Word Semantic Similarity Based on Chinese WordNet
Semantic similarity measurement of multilingual words is a challenging problem in data mining, information extraction, information retrieval, etc. This paper introduces an algorithm to measure the semantic similarity of Chinese-English bilingual words based on Chinese WordNet, an expansion of WordNet in Simplified Chinese. The algorithm not only measures the semantic similarity for Chinese and ...
OWNS: Cross-lingual Word Sense Disambiguation Using Weighted Overlap Counts and Wordnet Based Similarity Measures
We report here our work on English-French Cross-lingual Word Sense Disambiguation, where the task is to find the best French translation for a target English word depending on the context in which it is used. Our approach relies on identifying the nearest neighbors of the test sentence from the training data using a pairwise similarity measure. The proposed measure finds the affinity between two...
NTHU at NTCIR-10 CrossLink-2: An Approach toward Semantic Features
This paper describes the approaches of NTHU in the NTCIR-10 Cross-Lingual Link Discovery task, also named CrossLink-2. In this task, we aim to discover valuable anchors in Chinese, Japanese or Korean (CJK) articles and to link these anchors to related English Wikipedia pages. To achieve the objective, we do not only depend on Wikipedia’s distinguishing features (e.g. anchor links information an...
Cross-Lingual Syntactically Informed Distributed Word Representations
We develop a novel cross-lingual word representation model which injects syntactic information through dependency-based contexts into a shared cross-lingual word vector space. The model, termed CLDEPEMB, is based on the following assumptions: (1) dependency relations are largely language-independent, at least for related languages and prominent dependency links such as direct objects, as evidenc...